Introduction to Ambisonics¶

Fundamentals¶

Chris Hold, Virtual Acoustics (Aalto University)

Structure¶

  • Motivation
  • Discovering Spherical Harmonics
  • Ambisonics Encoder and Decoder

From Recording to Playback¶

Setup2Playback

Audio Description Formats¶

Channel Based Object Based Scene Based
Channel Object Scene

Scene Based Workflow¶

Channel

Benefits of Scene-based¶

  • Separation of recording, transmission / storage, and playback
  • Flexible with multiple options for each stage
  • Does not scale with the number of sources

Ambisonics¶

  • Implementation of a scene based format
  • "Long" history and connection to other sciences
  • Use spherical harmonics as basis functions

Example¶

No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image

From the Wave Equation to Spherical Harmonics¶

Wave equation \begin{equation} \nabla^2 p - \frac{1}{c^2} \frac{\partial^2 p}{\partial t^2} = 0 \quad. \end{equation}

In frequency domain, with wave-number $k = \frac{\omega}{c}$ results in the Helmholtz equation \begin{equation} (\nabla^2 + k^2) p = 0 \quad. \end{equation}

Multiple solutions, e.g. a mono chromatic plane wave with amplitude $\hat{A}(\omega)$ in Cartesian coordinates \begin{equation} p(t, x, y, z) = \hat{A}(\omega) e^{i(k_x x + k_y y + k_z z - \omega t)} \quad. \end{equation}

No description has been provided for this image

Any solution to the Helmholtz equation can also be expressed in spherical coordinates \begin{equation} p(r, \theta, \phi, k) = \sum_{n = 0}^{\infty} \sum_{m=-n}^{n} \color{olive}{(A_{mn} j_n(kr) + B_{mn} y_n(kr))} \; \color{brown}{Y_{n}^{m}(\theta,\phi)} \quad, \end{equation}

\begin{equation} p(r, \theta, \phi, k) = \sum_{n = 0}^{\infty} \sum_{m=-n}^{n} \color{olive}{C_{mn}(kr)} \; \color{brown}{Y_{n}^{m}(\theta,\phi)} \quad, \end{equation}

With two separable parts:

  • radial component : linear combination of spherical Bessel functions of first ($j_n$) and second kind $y_n$
  • angular component : spherical harmonics $\color{brown}{Y_{n}^{m}(\theta,\phi)}$

sound field defined on a sphere is fully captured by its spherical harmonics coefficients

E.g. a unit plane wave \begin{equation} p(r, \theta, \phi, k) = \sum_{n = 0}^{\infty} \sum_{m=-n}^{n} 4 \pi i^n j_n(kr) \left[ Y_n^m(\theta_k,\phi_k) \right] ^* Y_{n}^ {m}(\theta,\phi) \quad. \end{equation}

Discovering Spherical Harmonics¶

\begin{equation} Y_n^m(\theta,\phi)=\color{orange}{\sqrt{\frac{2n+1}{4\pi}\frac{(n-m)!}{(n+m)!}}}\, \color{teal}{P_n^m(\cos(\theta))}\, \color{purple}{e^{im\phi}} \quad, \end{equation}\begin{equation} Y_n^m(\theta,\phi)=\color{orange}{D_{nm}}\, \color{teal}{P_n^m(\cos(\theta))}\, \color{purple}{e^{im\phi}} \quad, \end{equation}

where

  • azimuth as $\phi$ and zenith/colatitude as $\theta$.
  • $P_n^m$ is the associated Legendre polynomial of order $n$ and degree $m$.

Combined with appropriate scaling give real Spherical Harmonics $Y_{n,m}(\theta,\phi)$ as

\begin{equation} Y_{n,m}(\theta,\phi) = \sqrt{ \frac{(2n+1)}{4\pi} \frac{(n-|m|)!}{(n+|m|)!} } P_{n,|m|}(\cos(\theta)) \begin{cases} \sqrt2\sin(|m|\phi) & \mathrm{if\hspace{0.5em}} m < 0 \quad,\\ 1 & \mathrm{if\hspace{0.5em}} m = 0 \quad,\\ \sqrt2\cos(|m|\phi) & \mathrm{if\hspace{0.5em}} m > 0 \quad. \end{cases} \end{equation}
  • Always check: Condon–Shortley phase convention $(-1)^m$
No description has been provided for this image

Let's look at the azimuthal component $e^{im\phi}$ and the zenithal component $P_n^m(\cos\theta)$

azimuthal component¶

No description has been provided for this image
No description has been provided for this image

zenithal component¶

No description has been provided for this image

Orthonormality¶

Two functions $\color{violet}f, \color{purple}g$ over a domain $\gamma$ are orthogonal if

\begin{equation} \int_\gamma \color{violet}{f^*(\gamma)} \color{purple}{g(\gamma)} \,\mathrm{d}\gamma = \langle \color{violet}f, \color{purple}g \rangle = 0 \quad,\,\mathrm{for}~f \neq g. \end{equation}

They are also orthonormal if \begin{equation} \color{violet}{\int_\gamma f^*(\gamma) f(\gamma) \,\mathrm{d}\gamma = \int_\gamma|f(\gamma)|^2 \,\mathrm{d}\gamma = \langle f, f \rangle } = 1 \quad, \\ \color{purple}{\int_\gamma g^*(\gamma) g(\gamma) \,\mathrm{d}\gamma = \int_\gamma |g(\gamma)|^2 \,\mathrm{d}\gamma = \langle g, g \rangle} = 1 \quad. \end{equation}

For the Spherical Harmonics:

  • the the azimuthal component $e^{im\phi}$ along $\phi$ is orthogonal w.r.t. the degree $m$
  • the zenithal component $P_n^m(\cos\theta)$ along $\theta$ is orthogonal w.r.t. the order $n$.

Their product is still orthogonal, and the scaling $\color{orange}{D_{nm}}$ ensures orthonormality such that \begin{equation} \int_\Omega Y_n^m(\Omega)^* \, Y_{n'}^{m'}(\Omega) \,\mathrm{d}\Omega = \langle Y_n^m(\Omega) , Y_{n'}^{m'}(\Omega) \rangle = \delta_{nn'}\delta_{mm'} \quad , \end{equation}

and

\begin{equation} \int_{{\Omega} \in \mathbb{S}^2} |Y_n^m({\Omega})|^2 \mathrm{d}{\Omega} = 1 \quad . \end{equation}

Example¶

show normality for $Y_0^0$: $$ Y_0^0(\theta,\phi)=\sqrt{\frac{0+1}{4\pi}\frac{(0)!}{(0)!}} P_0^0(\cos(\theta)) e^{i0\phi} = \sqrt{\frac{1}{4\pi}} \quad, $$ hence $$ \int_{{\Omega} \in \mathbb{S}^2} Y_0^0(\theta,\phi)^* Y_0^0(\theta,\phi) \,\mathrm{d}{\Omega} = \int_{{\Omega} \in \mathbb{S}^2} \sqrt{\frac{1}{4\pi}}\sqrt{\frac{1}{4\pi}} \, \mathrm{d}{\Omega} = 4\pi \frac{1}{4\pi} = 1 \quad. $$

Example¶

Test orthogonality of functions $\color{violet}f, \color{purple}g$

$$ \langle \color{violet}f, \color{purple}g \rangle \overset{?}{=} 0 \quad,\,\mathrm{for}~f \neq g. $$
Test for orthogonality in azimuth <f_azi, g_azi> =  -0.0
Test for orthogonality in zenith <f_zen, g_zen> =  0.0
No description has been provided for this image

Spherical Harmonic Transform (SHT)¶

  • Find a scene based encoding for sound field
  • We showed that the spherical harmonics $Y_n^m({\Omega})$ are a set of suitable basis functions on the sphere
  • We also showed that a sound field (on the sphere) $s({\Omega})$ is fully captured by its spherical harmonics coefficients $\sigma_{nm}$

This can be expressed with $\Omega = [\phi, \theta]$ as the inverse Spherical Harmonic Transform (iSHT) \begin{equation} s({\Omega}) = \sum_{n = 0}^{N=\infty} \sum_{m=-n}^{+n} \sigma_{nm} Y_n^m({\Omega}) \quad. \end{equation}

Spherical harmonics coefficients $\sigma_{nm}$ can be derived with the Spherical Harmonic Transform (SHT) \begin{equation} \sigma_{nm} = \int_{{\Omega} \in \mathbb{S}^2} s({\Omega}) [Y_n^m({\Omega})]^* \mathrm{d}{\Omega} = \langle [Y_n^m({\Omega})] , s({\Omega}) \rangle \quad. \end{equation}

  • SHT also refered to as spherical Fourier transform
  • Note the transform from a spatially discrete to continuous domain

Spherical Grids¶

The SHT evaluates the continuous integral over $\Omega$ \begin{equation} \sigma_{nm} = \int_{{\Omega} \in \mathbb{S}^2} s({\Omega}) [Y_n^m({\Omega})]^* \mathrm{d}{\Omega} \quad . \end{equation} Quadrature methods allow evaluation by spherical sampling at certain (weighted) grid points such that \begin{equation} \sigma_{nm} \approx \sum_{q=1}^{Q} w_q s({\Omega}_q) [Y_n^m({\Omega}_q)]^* \quad. \end{equation}

Certain grids with sampling points ${\Omega}_q$ and associated sampling weights $w_q$ have certain properties:

  • quadrature grids allow numerical integration of spherical polynomials
  • Spherical sampling dictates maximum order
  • easy to evaluate are uniform/regular grids, with constant $w_q = \frac{4\pi}{Q}$
  • An example are so-called spherical t-designs(t), which allow a SHT up to order $N = \lfloor t/2 \rfloor$
No description has been provided for this image
No description has been provided for this image

Spherical Dirac¶

We can show that spherical harmonics are orthogonal (even orthonormal) with \begin{equation} \int_\Omega Y_n^m(\Omega) \, Y_{n'}^{m'}(\Omega) \,\mathrm{d}\Omega = \delta_{nn'}\delta_{mm'} \quad . \end{equation}

Because of their completeness, we can also directly formulate a Dirac function on the sphere as \begin{equation} \sum_{n=0}^{N=\infty} \sum_{m=-n}^n [Y_n^m({\Omega'})]^* Y_n^m(\Omega) = \delta(\Omega - \Omega') \quad, \end{equation}

and therefore the spherical Fourier coefficients $\sigma_{nm}$ \begin{equation} SHT\{\delta(\Omega - \Omega')\} = \int_{{\Omega} \in \mathbb{S}^2} \delta(\Omega - \Omega') \, [Y_n^m({\Omega})]^* \mathrm{d}{\Omega} = [Y_n^m({\Omega'})]^* \quad . \end{equation}

Order-Limitation of Spherical Dirac Pulse¶

No description has been provided for this image

Example¶

Integrate (order-limited) Dirac $\delta_N(\Omega - \Omega')$ over sphere

$$ \int_{{\Omega} \in \mathbb{S}^2} \delta_N(\Omega - \Omega') \mathrm{d}{\Omega} = \int_{{\Omega} \in \mathbb{S}^2} \color{blue}{\sum_{n=0}^{N} \sum_{m=-n}^n\, [Y_n^m({\Omega'})]^* Y_n^m(\Omega)} \,\mathrm{d}{\Omega} \quad, $$

by discretization with sufficient t-design $$ \int_{{\Omega} \in \mathbb{S}^2} \color{blue}{\sum_{n=0}^{N} \sum_{m=-n}^n\, [Y_n^m({\Omega'})]^* Y_n^m(\Omega)} \,\mathrm{d}{\Omega} = \frac{4\pi}{Q}\sum_{q=1}^{Q} \color{orange}{\sum_{n=0}^{N} \sum_{m=-n}^n\, [Y_n^m({\Omega'})]^* Y_n^m(\Omega_q)} $$

No description has been provided for this image

Matrix Notations¶

Stack the spherical harmonics evaluated at $\Omega$ up to spherical order $N$ as

$$ \mathbf{Y} = \left[ \begin{array}{ccccc} Y_0^0(\Omega[0]) & Y_1^{-1}(\Omega[0]) & Y_1^0(\Omega[0]) & \dots & Y_N^N(\Omega[0]) \\ Y_0^0(\Omega[1]) & Y_1^{-1}(\Omega[1]) & Y_1^0(\Omega[1]) & \dots & Y_N^N(\Omega[1]) \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ Y_0^0(\Omega[Q-1]) & Y_1^{-1}(\Omega[Q-1]) & Y_1^0(\Omega[Q-1]) & \dots & Y_N^N(\Omega[Q-1]) \end{array} \right] $$

such that $\mathbf{Y} : [Q, (N+1)^2]$

  • soundfield pressure is real signal -> only real spherical harmonics needed
  • orthonormal scaling introduced earlier called N3D by convention (in contrast to SN3D)
  • stacking them as above, where $idx_{n,m} = n^2+n+m$, is called ACN
  • We can stack a time signal $s(t_0, t_1, \ldots, t_T)$ as vector $\mathbf{s}$.
\begin{equation*} \mathbf{s} = \begin{bmatrix} s(t_0) \\ s(t_1) \\ \vdots \\ s(t_T) \end{bmatrix} \end{equation*}
  • This leads to a matrix notation of multiple discrete signals $\mathbf{S} : [T, Q]$.
\begin{equation*} \mathbf{S} = \begin{bmatrix} s_0(t_0) & s_1(t_0) & \cdots & s_Q(t_0)\\ s_0(t_1) & s_1(t_1) & \cdots & s_Q(t_1)\\ \vdots & \vdots& \ddots & \vdots \\ s_0(t_T) & s_1(t_T) & \cdots & s_Q(t_T) \end{bmatrix} \end{equation*}
  • Similarly, stacking Ambisonics coefficients $\sigma_n^m(t_0, t_1, \ldots, t_T)$ into $\mathbf{\chi} : [T, (N+1)^2]$
\begin{equation*} \mathbf{\chi} = \begin{bmatrix} \sigma_0^0(t_0) & \sigma_1^{-1}(t_0) & \cdots & \sigma_N^N(t_0) \\ \sigma_0^0(t_1) & \sigma_1^{-1}(t_1) & \cdots & \sigma_N^N(t_1) \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_0^0(t_T) & \sigma_1^{-1}(t_T) & \cdots & \sigma_N^N(t_T) \end{bmatrix} \end{equation*}

We obtain ambisonic signals matrix $\mathbf{\chi} : [T, (N+1)^2]$ from signals $\mathbf{S} : [T, Q]$ by SHT in matrix notation as \begin{equation} \mathbf{\chi}(t) = \mathbf{S}(t) \, \mathrm{diag}(w_q) \, \mathbf{Y} \quad. \end{equation} With the SH basis functions evaluated at $\Omega_q$ as $\mathbf{Y} : [Q, (N+1)^2]$.

Obtaining the discrete signals $s_q(t)$ is a linear combination of SH basis functions evaluated at $\Omega_q$ as $\mathbf{Y} : [Q, (N+1)^2]$ and the SH coefficients. This inverse SHT in matrix notation with ambisonic signals matrix $\mathbf{\chi} : [T, (N+1)^2]$ is

\begin{equation} s_q(t) = \mathbf{\chi}(t) \, \mathbf{Y}^T \quad . \end{equation}

From the Ambisonics Encoder, to Directional Weighting, to a Decoder

Encoder¶

A single plane wave encoded in direction $\Omega_1$ with signal $\mathbf{s}$ is directly the outer product with the Dirac SH coefficients

\begin{equation} \mathbf{\chi}_\mathrm{PW(\Omega_1)} = \mathbf{s} \, \mathbf{Y}(\Omega_1) \quad. \end{equation}

For multiple sources $Q$, we stack and sum \begin{equation} \mathbf{\chi}_\mathrm{PW(\Omega_Q)} = \sum_{q=1}^Q\mathbf{s}_q \, \mathbf{Y}(\Omega_q) = \mathbf{S} \, \mathbf{Y} \quad. \end{equation}

No description has been provided for this image
No description has been provided for this image

Directional Weighting¶

  • Directionally filtering a soundfield in direction $\Omega_k$ by weighting
  • Weighting in SH domain $w_{nm}$ is an elegant way of beamforming

The simplest beamformer is a spherical Dirac in direction $\Omega_k$ normalized by its energy, i.e. $\mathrm{max}DI$ \begin{equation} w_{nm, \mathrm{maxDI}}(\Omega_k) = \frac{4\pi}{(N+1)^2} Y_{n,m}(\Omega_k) \end{equation}

No description has been provided for this image

Other patterns can be achieved by weighting the spherical Fourier spectrum. Axis-symmetric patterns reduce to only a modal weighting $c_n$, such that \begin{equation} w_{nm}(\Omega_k) = c_{n} \, Y_{n,m}(\Omega_k) \end{equation}

E.g. $\mathrm{max}\vec{r}_E$ weights each order $n$ with \begin{equation} c_{n,\,\mathrm{max}\vec{r}_E} = P_n[\cos(\frac{137.9^\circ}{N+1.51})] \quad, \end{equation} with the Legendre polynomials $P_n$ of order $n$.

No description has been provided for this image

Or we can define a (SH) Butterworth filter with \begin{equation} c_{n,\,\mathrm{Butterworth}} = \frac{1}{\sqrt{1+(n/n_c)^{2k}}} \quad, \end{equation} with the filter-order $k$ and cut-on SH-order $n_c$.

No description has been provided for this image

Decoder¶

  • Beamformer extracts a portion of the soundfield in steering direction $\Omega$
  • Pointing a set of beamformers in directions $\Omega_k$ results in a set of signal components
  • in case of $\mathrm{max}DI$ proportional to iSHT in $\Omega_k$
\begin{equation} s({\Omega_k}) = \sum_{n = 0}^{N} \sum_{m=-n}^{+n} w_{nm}({\Omega_k}) \, \sigma_{nm} \quad, \end{equation}

or in matrix notation with $\mathbf{S}$ and beamforming weights $\mathbf{c}_n$ and $\mathbf{Y}$ \begin{equation} \mathbf{S} = \mathbf{\chi} \, \mathrm{diag_N}(\mathbf{c}_n) \, \mathbf{Y}^T \quad . \end{equation}

Example¶

SHD signal of 3 sine signal PWs plus noise -> three beamformers, two pointing to sources

No description has been provided for this image
No description has been provided for this image

Example¶

Decode on a t-design(6) (sufficient up to $N = 3$):

  • a 3rd order ambisonic signal
No description has been provided for this image

Example¶

Decode on a t-design(6) (sufficient up to $N = 3$):

  • a 5th order ambisonic signal
  • an 8th order ambisonic signal
No description has been provided for this image
No description has been provided for this image

Loudspeaker Decoders¶

In [25]:
# Loudspeaker Setup
ls_dirs = np.array([[-135, -80, -45, 0, 45, 80, 135, -135, -60, -30, 30, 60, 135],
                    [0, 0, 0, 0, 0, 0, 0, 60, 60, 60, 60, 60, 60]])
ls_x, ls_y, ls_z = spa.utils.sph2cart(spa.utils.deg2rad(ls_dirs[0, :]),
                                      spa.utils.deg2rad(90 - ls_dirs[1, :]))
ls_setup = spa.decoder.LoudspeakerSetup(ls_x, ls_y, ls_z)
ls_setup.pop_triangles(normal_limit=85, aperture_limit=90,
                       opening_limit=150)
ls_setup.show()
Face not pointing towards listener: [0 1 2]
Face not pointing towards listener: [0 6 2]
Face not pointing towards listener: [2 5 6]
Face not pointing towards listener: [2 3 4]
Face not pointing towards listener: [2 5 4]
No description has been provided for this image
In [26]:
spa.plot.decoder_performance(ls_setup, 'NLS')
No description has been provided for this image
In [27]:
spa.plot.decoder_performance(ls_setup, 'VBAP')
No description has been provided for this image
In [28]:
spa.plot.decoder_performance(ls_setup, 'SAD')
No description has been provided for this image
In [29]:
spa.plot.decoder_performance(ls_setup, 'MAD')
No description has been provided for this image
In [30]:
spa.plot.decoder_performance(ls_setup, 'EPAD', N_sph=2)
No description has been provided for this image
In [31]:
ls_setup.ambisonics_setup(update_hull=True)
spa.plot.decoder_performance(ls_setup, 'ALLRAD')
No description has been provided for this image
In [32]:
ls_setup.ambisonics_setup(update_hull=True)
spa.plot.decoder_performance(ls_setup, 'ALLRAD2')
No description has been provided for this image

Binaural Decoding¶

\begin{equation} s^{l,r}(t) = x(t) * h_{\mathrm{HRIR}}^{l,r}({\Omega}, t) \quad, \end{equation}

where $(*)$ denotes the time-domain convolution operation.

Transforming to the time-frequency domain through the time-domain Fourier transform, further assuming plane-wave components $\bar X (\Omega)$, the ear input signals are given as \begin{equation} S^{l,r}(\omega) = \int_{\Omega} \bar X (\Omega, \omega) H^{l,r}(\Omega, \omega) \,\mathrm{d}\Omega \quad. \end{equation}

\begin{equation} S^{l,r}(\omega) = \sum_{n = 0}^{N} \sum_{m = -n}^{+n} \chi_{nm}(\omega) \breve H_{nm}^{l,r}(\omega) \quad. \end{equation}

For one ear (left) this can be interpreted as a frequency dependent ambisonic beamformer \begin{equation} s^l(\omega) = \chi_{nm}(\omega) [\breve H_{nm}^{l}(\omega)]^T \quad . \end{equation}

In [ ]: